The Performance Evaluation of Data Caching in the Grid Environment
Authors
Abstract
High-latency networks and the large data sets produced by applications in a data grid environment can place a heavy burden on network resources and lengthen application completion times. Data caching is one technique that can improve the efficiency of data usage in distributed environments by storing popular data close to the clients. Caching is well known to reduce bandwidth usage and access latency [1]. Although caching in grid environments has been studied, that work has focused mostly on replacement policies [2, 3, 4]. Thus, the behavior and effects of data caching in grid environments remain relatively unknown. In this paper, we study the impact of integrating a block-based data proxy into the grid environment, focusing on how the data proxy's cache size affects file access latency, hit ratio, network bandwidth reduction, and total execution time. The experiment simulates the data access behavior of Grid Datafarm [5] and is run in NS-2 [7]. The results indicate that grid caching can reduce network bandwidth consumption and improve performance in the grid environment.
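The abstract measures hit ratio and bandwidth reduction as functions of the proxy's cache size but does not state the replacement policy, block size, or workload. The sketch below is a toy illustration under those gaps, not the paper's NS-2 simulation: it assumes LRU replacement, fixed-size blocks keyed by (file, block index), and a hand-made trace, and the names `BlockProxyCache`, `CacheStats`, and `sweep` are hypothetical. It replays one trace against proxies of different sizes and reports hits, misses, and bytes fetched from the origin.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    bytes_from_origin: int = 0  # bytes fetched across the wide-area link

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class BlockProxyCache:
    """Hypothetical block-based data proxy; LRU replacement is an assumption."""

    def __init__(self, capacity_blocks: int, block_size: int) -> None:
        self.capacity = capacity_blocks
        self.block_size = block_size
        # Keys are (file_id, block_index); insertion order tracks recency.
        self.blocks: OrderedDict[tuple[str, int], None] = OrderedDict()
        self.stats = CacheStats()

    def read(self, file_id: str, offset: int, length: int) -> None:
        """Serve a byte range, fetching any missing blocks from the origin."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for index in range(first, last + 1):
            key = (file_id, index)
            if key in self.blocks:
                self.stats.hits += 1
                self.blocks.move_to_end(key)  # now most recently used
                continue
            self.stats.misses += 1
            self.stats.bytes_from_origin += self.block_size
            if self.capacity == 0:
                continue  # no cache space: nothing is retained
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[key] = None


def sweep(trace, cache_sizes, block_size):
    """Replay the same trace against proxies of different cache sizes."""
    results = {}
    for size in cache_sizes:
        cache = BlockProxyCache(size, block_size)
        for file_id, offset, length in trace:
            cache.read(file_id, offset, length)
        results[size] = cache.stats
    return results


if __name__ == "__main__":
    # Toy trace with 4-byte blocks: read "a", then "b", then "a" again.
    trace = [("a", 0, 8), ("b", 0, 8), ("a", 0, 8)]
    for size, stats in sweep(trace, [0, 2, 4], block_size=4).items():
        print(size, stats.hits, stats.misses, stats.bytes_from_origin)
```

Comparing `bytes_from_origin` against the zero-size run gives the bandwidth reduction for each cache size. In this toy trace the 2-block cache thrashes under LRU and saves nothing, while the 4-block cache serves the second read of "a" locally (2 hits, 16 instead of 24 bytes from the origin), which illustrates why cache size matters.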
Similar resources
Dynamic Replication based on Firefly Algorithm in Data Grid
In data grids, reservation is an accepted way to provide scheduling and quality of service. Users need access to data stored in a geographically distributed environment, which can be solved by using replication, and an action taken to reach certainty. As a result, users are directed toward the nearest replica to access information. The most important point is to know in which sites and distributed sy...
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
A Data Grid is an infrastructure that manages huge numbers of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...
Reliability and Availability Improvement in Economic Data Grid Environment Based On Clustering Approach
Abstract - One of the important problems in grid environments is data replication across grid sites. The reliability and availability of data replication are in some cases considered low. Clustering can be used to separate sites with high reliability and availability from sites with low availability and reliability. In this study, the data grid dynamically evaluates and predicts the condition of t...
A Model for Locating Services in Grid Environment
A model for locating services in a grid is described here. It integrates the Grid framework OGSA (Open Grid Services Architecture) [1], using the VO (Virtual Organization) [2] concept to divide logical grid services into different organizations based on their purpose and their requirements for resource sharing and service provision. The CARP hash-based information caching mechanism and a hierarchy...
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems such as data grids, cloud computing provides these factors on a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Publication date: 2005